feat: Add Modular Pipeline for Stable Diffusion 3 (SD3)#13324

Open
AlanPonnachan wants to merge 26 commits into huggingface:main from AlanPonnachan:feat/sd3-modular-pipeline

Conversation

Contributor

@AlanPonnachan AlanPonnachan commented Mar 24, 2026

What does this PR do?

This PR introduces the modular architecture for Stable Diffusion 3 (SD3), implementing both Text-to-Image (T2I) and Image-to-Image (I2I) pipelines.

Key additions:

  • Added StableDiffusion3ModularPipeline and StableDiffusion3AutoBlocks to the dynamic modular pipeline resolver.
  • Migrated SD3-specific mechanics to the new BlockState.
  • Added corresponding dummy objects and lazy-loading fallbacks.
  • Added TestSD3ModularPipelineFast and TestSD3Img2ImgModularPipelineFast test suites.

Related issue: #13295

Usage Example

import torch
from IPython.display import display
from diffusers import ComponentsManager
from diffusers.modular_pipelines.stable_diffusion_3 import StableDiffusion3ModularPipeline, StableDiffusion3AutoBlocks
from diffusers.utils import load_image

from diffusers import FlowMatchEulerDiscreteScheduler, SD3Transformer2DModel, AutoencoderKL
from diffusers.guiders import ClassifierFreeGuidance
from diffusers.image_processor import VaeImageProcessor
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

components = ComponentsManager()
components.enable_auto_cpu_offload(device="cuda")

# Instantiate the Modular Pipeline 
blocks = StableDiffusion3AutoBlocks()
pipeline = StableDiffusion3ModularPipeline(blocks=blocks, components_manager=components)

repo_id = "stabilityai/stable-diffusion-3-medium-diffusers"
print("Loading components...")

# Load ONLY CLIP tokenizers
tokenizer = CLIPTokenizer.from_pretrained(repo_id, subfolder="tokenizer")
tokenizer_2 = CLIPTokenizer.from_pretrained(repo_id, subfolder="tokenizer_2")

# Load diffusers components
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
guider = ClassifierFreeGuidance.from_config({"guidance_scale": 7.0})
image_processor = VaeImageProcessor(vae_scale_factor=8, vae_latent_channels=16)

# Load ONLY CLIP text encoders
text_encoder = CLIPTextModelWithProjection.from_pretrained(repo_id, subfolder="text_encoder", torch_dtype=torch.float16)
text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(repo_id, subfolder="text_encoder_2", torch_dtype=torch.float16)

# Load Transformer and VAE
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", torch_dtype=torch.float16)
vae = AutoencoderKL.from_pretrained(repo_id, subfolder="vae", torch_dtype=torch.float16)

# Inject components directly into the pipeline
pipeline.update_components(
    tokenizer=tokenizer,
    tokenizer_2=tokenizer_2,
    tokenizer_3=None,    # Dropped to prevent OOM
    scheduler=scheduler,
    guider=guider,
    image_processor=image_processor,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    text_encoder_3=None, # Dropped to prevent OOM
    transformer=transformer,
    vae=vae
)

print("Components loaded successfully! Memory saved.")


# TEXT-TO-IMAGE 

prompt = "A highly detailed macro photography of a glowing bioluminescent blue butterfly resting on a vibrant red rose, dark magical forest background, cinematic lighting, 8k resolution, masterpiece"

print("Running Text-to-Image...")
t2i_output = pipeline(
    prompt=prompt,
    num_inference_steps=28,
    guidance_scale=7.0,
    generator=torch.manual_seed(42)
)
t2i_output.images[0].save("sd3_modular_t2i.png")
print("Saved sd3_modular_t2i.png")
display(t2i_output.images[0])


# IMAGE-TO-IMAGE 

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((1024, 1024))

prompt_i2i = "A beautiful classic impressionist oil painting of a cat looking at the camera, thick expressive brushstrokes, vibrant colors, museum masterpiece"

print("Running Image-to-Image...")
i2i_output = pipeline(
    prompt=prompt_i2i,
    image=init_image,
    strength=0.8,
    num_inference_steps=28,
    guidance_scale=7.0,
    generator=torch.manual_seed(42)
)
i2i_output.images[0].save("sd3_modular_i2i.png")
print("Saved sd3_modular_i2i.png")
display(i2i_output.images[0])

Colab notebook: https://colab.research.google.com/drive/18_tZWIQdObq8EX0Vyd9ysGA-oACDwpf8?usp=sharing

Outputs

Text-to-Image:

sd3_modular_t2i

Image-to-Image:

sd3_modular_i2i

Who can review?

@sayakpaul @asomoza

@sayakpaul sayakpaul requested review from asomoza and yiyixuxu March 25, 2026 02:22
@sayakpaul
Member

sayakpaul commented Mar 25, 2026

@AlanPonnachan thanks for this PR! Could you also provide some test code and sample outputs?

Member

@sayakpaul sayakpaul left a comment


Thanks for getting started on this! I left some comments (majorly on the use of guidance).

Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/before_denoise.py Outdated
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/before_denoise.py Outdated
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/denoise.py Outdated
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/denoise.py Outdated
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/encoders.py
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/encoders.py
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/encoders.py Outdated
@sayakpaul
Member

@claude can you review this?

@claude

claude bot commented Mar 28, 2026

Claude Code is working…

I'll analyze this and get back to you.

View job run

@sayakpaul
Member

@bot /style

@github-actions
Contributor

github-actions bot commented Mar 28, 2026

Style bot fixed some files and pushed the changes.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@AlanPonnachan
Contributor Author

@sayakpaul
test_modular_pipeline_stable_diffusion_3.py tests are passing.

Sample outputs you can find here: #13324 (comment)

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks for working on this!
I left one comment

Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/denoise.py Outdated
@AlanPonnachan AlanPonnachan requested a review from yiyixuxu April 1, 2026 17:53
logger = logging.get_logger(__name__)


# auto_docstring
Collaborator


i added a doc page on this here #13382
basically you need to run

python utils/modular_auto_docstring.py --fix_and_overwrite

and to look through the generated docstrings to see if all the parameters are properly defined

Contributor Author


@yiyixuxu, I added descriptions to most of the InputParam and OutputParam entries and ran the above script.
I skimmed through the generated docstrings once and they looked right.
Let me know your thoughts!

@yiyixuxu
Collaborator

yiyixuxu commented Apr 1, 2026

@claude
can you do a review here?

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Claude Code is working…

I'll analyze this and get back to you.

View job run

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks, i left some comments

Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/before_denoise.py Outdated
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/encoders.py
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/encoders.py
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/encoders.py Outdated
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/inputs.py Outdated
Comment thread src/diffusers/modular_pipelines/stable_diffusion_3/inputs.py Outdated
yiyixuxu pushed a commit that referenced this pull request Apr 4, 2026
Create .ai/modular.md as a shared reference for modular pipeline
conventions, patterns, and common mistakes — parallel to the existing
models.md for model conventions.

Consolidates content from the former modular-conversion.md skill file
and adds gotchas identified from reviewing recent modular pipeline PRs
(LTX #13378, SD3 #13324).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2. refactored encoders and inputs
3. refactored for more flat structure
4. styling
@AlanPonnachan AlanPonnachan requested a review from yiyixuxu April 5, 2026 06:33
@github-actions github-actions bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 9, 2026
Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!

@yiyixuxu
Collaborator

@bot /style

@github-actions
Contributor

github-actions bot commented Apr 15, 2026

Style bot fixed some files and pushed the changes.

@github-actions github-actions bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 15, 2026
@yiyixuxu
Collaborator

can you look into the CI failures?

@github-actions github-actions bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 16, 2026
@sayakpaul
Member

@claude could you review the PR?

@github-actions
Contributor

github-actions bot commented Apr 16, 2026

Claude finished @sayakpaul's task in 3m 30s — View job


Review of SD3 Modular Pipeline

  • Read review guidelines (AGENTS.md, modular.md, etc.)
  • Read all changed files under src/diffusers/
  • Analyze code for correctness, conventions, and issues
  • Post review feedback

Overall the structure follows the modular pipeline conventions well — the flat block layout requested by @yiyixuxu is in place, # Copied from headers are used correctly, and the AutoPipelineBlocks/SequentialPipelineBlocks hierarchy is clean. The registration files (__init__.py, dummy objects, pipeline mapping) all look correct.

That said, I found several issues that should be addressed before merge:


Issue 1 (High): guidance_scale and related params should not be pipeline inputs

Files: denoise.py:93-117, modular_blocks_stable_diffusion_3.py (docstrings)

The StableDiffusion3LoopDenoiser accepts guidance_scale, skip_layer_guidance_scale, skip_layer_guidance_start, and skip_layer_guidance_stop as InputParams, and then manually sets them on the guider:

# denoise.py:138-145
if hasattr(components.guider, "guidance_scale"):
    components.guider.guidance_scale = block_state.guidance_scale

This violates the modular pipeline convention (see modular.md gotcha #3): guidance parameters should be configured on the guider directly by the user, not forwarded through the pipeline. No other modular pipeline (flux2, ltx, helios) does this. Remove these four InputParams and the hasattr/setattr lines.

Fix this →
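The convention can be illustrated with a toy sketch (the class and attribute names below are hypothetical stand-ins, not the diffusers API): guidance settings live on the guider object the user constructs up front, and the denoise loop reads them from there instead of accepting them as pipeline inputs.

```python
# Toy illustration only: hypothetical stand-ins, not the diffusers API.
class Guider:
    def __init__(self, guidance_scale=7.0):
        # guidance config is set once, on the guider itself
        self.guidance_scale = guidance_scale

class Pipeline:
    def __init__(self, guider):
        self.guider = guider

    def __call__(self, prompt):
        # the denoise loop reads the guider's own config; it never
        # accepts or forwards guidance_scale as a call argument
        return f"{prompt} @ cfg={self.guider.guidance_scale}"

pipe = Pipeline(Guider(guidance_scale=5.0))
print(pipe("a cat"))  # a cat @ cfg=5.0
```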


Issue 2 (High): Text encoder accepts pre-computed embeds (dead code + convention violation)

File: encoders.py:496-517

StableDiffusion3TextEncoderStep.inputs declares prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, and negative_pooled_prompt_embeds as inputs — but the __call__ method never checks or uses them. It always calls encode_prompt(). This is both dead code and a violation of modular.md gotcha #4: in modular pipelines, users skip encoding by removing the encoder block, not by passing pre-computed embeds.

Remove the four pre-computed embed InputParams from StableDiffusion3TextEncoderStep.inputs.

Fix this →


Issue 3 (Medium): Missing cache_context in denoise loop

File: denoise.py:150-163

The denoising loop calls components.transformer(...) without wrapping it in components.transformer.cache_context(context_name). Other modular pipelines that use the guider (ltx, helios) use cache_context to enable KV-cache reuse across conditional/unconditional passes:

# Reference pattern (ltx/denoise.py:136-148):
context_name = getattr(guider_state_batch, components.guider._identifier_key, None)
with components.transformer.cache_context(context_name):
    guider_state_batch.noise_pred = components.transformer(...)

Without this, guider-based KV caching won't work. Add the cache_context wrapper and retrieve context_name from the guider state.

Fix this →
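The pattern can be sketched in isolation (toy classes, not the real diffusers objects): each guider batch carries a context name, and the transformer call is wrapped in a context manager keyed on that name so a cache layer can keep conditional and unconditional passes separate.

```python
from contextlib import contextmanager

# Toy stand-in for the cache_context pattern; names are illustrative only.
class TinyTransformer:
    def __init__(self):
        self.contexts_seen = []

    @contextmanager
    def cache_context(self, name):
        # a real cache layer would key its entries on `name` here
        self.contexts_seen.append(name)
        yield

    def __call__(self, latents):
        return latents + 1

transformer = TinyTransformer()
for context_name in ("cond", "uncond"):  # one pass per guider batch
    with transformer.cache_context(context_name):
        noise_pred = transformer(0)

print(transformer.contexts_seen)  # ['cond', 'uncond']
```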


Issue 4 (Medium): Test model path points to contributor repo

File: test_modular_pipeline_stable_diffusion_3.py:47,123

pretrained_model_name_or_path = "AlanPonnachan/tiny-sd3-modular"

Per modular.md gotcha #9, tiny test models must live under hf-internal-testing/, not personal repos. This needs to be transferred before merge.


Issue 5 (Medium): Wrong return type hint on inputs property

File: denoise.py:57

@property
def inputs(self) -> list[tuple[str, Any]]:

This should be list[InputParam] to match what is actually returned. Similarly, inputs.py:157 declares intermediate_outputs -> list[str] but returns list[OutputParam]. Both are incorrect type annotations.
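A minimal sketch of the fix, using hypothetical dataclass stand-ins for InputParam/OutputParam rather than the real diffusers classes:

```python
from dataclasses import dataclass

# Hypothetical stand-ins, just to show the corrected annotations.
@dataclass
class InputParam:
    name: str

@dataclass
class OutputParam:
    name: str

class DenoiseBlock:
    @property
    def inputs(self) -> list[InputParam]:  # was: list[tuple[str, Any]]
        return [InputParam("latents"), InputParam("timestep")]

    @property
    def intermediate_outputs(self) -> list[OutputParam]:  # was: list[str]
        return [OutputParam("latents")]

block = DenoiseBlock()
print([p.name for p in block.inputs])  # ['latents', 'timestep']
```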


Issue 6 (Low): encode_prompt always encodes negative prompts

File: encoders.py:252-299

The encode_prompt function always computes negative prompt embeddings (defaulting negative_prompt to ""). In the modular pipeline world, the guider is responsible for deciding whether negative conditioning is needed. If the user isn't using CFG, this is wasted compute. Consider making negative encoding conditional on whether a negative prompt was actually provided (i.e., not defaulting to empty string).
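A sketch of the suggested change (toy encoder, not the real encode_prompt signature): treat a missing negative prompt as "no negative conditioning" and skip the second encoder pass entirely, instead of encoding a defaulted empty string.

```python
# Toy encoder standing in for the text encoder forward pass.
def encode(text):
    return f"<emb:{text}>"

def encode_prompt(prompt, negative_prompt=None):
    prompt_embeds = encode(prompt)
    # only run the second encoder pass when a negative prompt is given;
    # without CFG this saves a full forward pass
    negative_embeds = encode(negative_prompt) if negative_prompt is not None else None
    return prompt_embeds, negative_embeds

print(encode_prompt("a cat"))            # ('<emb:a cat>', None)
print(encode_prompt("a cat", "blurry"))  # ('<emb:a cat>', '<emb:blurry>')
```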


Issue 7 (Low): Img2ImgPrepareLatentsStep output naming

File: before_denoise.py:410-418

The StableDiffusion3Img2ImgPrepareLatentsStep declares initial_noise as its output, but per modular.md gotcha #8, the OutputParam.template("latents") template means "Denoised latents" which doesn't apply here. The current custom OutputParam("initial_noise", ...) is correct — but note that the block modifies block_state.latents in-place (line 425-427) without declaring latents as an output. This could be confusing since latents is the actually important output (the noised latents used for denoising), while initial_noise just stores the original noise for reference.


Minor nits

  • decoders.py:39 — The VaeImageProcessor config has "vae_scale_factor": 8 but doesn't include "vae_latent_channels": 16, while encoders.py:328 does. These should be consistent for SD3 (which uses 16 latent channels).
  • inputs.py:241 — Mutable default argument additional_batch_inputs: list[str] = []. Use None with a default in the body, or a tuple, to avoid the mutable default gotcha.
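The mutable-default gotcha from the second nit is a general Python pitfall, sketched here independently of the inputs.py signature:

```python
# The default list is created once at definition time and shared by calls.
def collect_bad(x, acc=[]):
    acc.append(x)
    return acc

# Fix: default to None and create a fresh list inside the body.
def collect_good(x, acc=None):
    if acc is None:
        acc = []
    acc.append(x)
    return acc

print(collect_bad(1))   # [1]
print(collect_bad(2))   # [1, 2]  <- state leaked from the first call
print(collect_good(1))  # [1]
print(collect_good(2))  # [2]
```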

What looks good

  • Flat block structure with Auto at the top level and flat SequentialPipelineBlocks per workflow — matches the recommended pattern
  • # Copied from headers used correctly for cross-pipeline utilities (retrieve_timesteps, calculate_shift, retrieve_latents)
  • # auto_docstring present on all assembled blocks
  • Clean separation of concerns: image processing, VAE encoding, text encoding, timestep setup, latent prep, denoise, and decode are all separate blocks
  • Registration in __init__.py, dummy objects, and MODULAR_PIPELINE_MAPPING all look correct
  • _workflow_map on StableDiffusion3AutoBlocks is a nice touch for discoverability

@github-actions github-actions bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 16, 2026